-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sequencer)!: rewrite check_tx to be more efficient and fix regression #1515
Conversation
577d0c4
to
0de80ec
Compare
0a4f435
to
56ff638
Compare
parked.add( | ||
timemarked_tx, | ||
match parked.add( | ||
timemarked_tx.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you can do let id = timemarked_tx.id()
before this, then use that later on in the function and remove the clone()
if let Some(rsp) = insert_into_mempool(mempool, &state, signed_tx, metrics).await { | ||
return rsp; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function returning an Option
that's only set if there's an error response is somewhat confusing, would prefer for this function to return a Result<(), response::CheckTx>
so the error variant is the error response
same for check_removed_comet_bft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it, feels much better that way
/// Returns a [`response::CheckTx`] if the transaction fails any of the checks. | ||
/// Otherwise, it returns the [`SignedTransaction`] to be inserted into the mempool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use a Result<SignedTransaction, response::CheckTx>
as this function's return
I think this PR looks good, just a few smaller comments!
how does this work? with cometbft recheck? |
Hmmm. I think I misheard why we wanted to allow rechecks to add to the appside mempool. I checked out the CometBFT code and didn't see it persisting the transactions anywhere that was accessible, (I wonder if I missed it). In the scenario where CometBFT doesn't persist, then the allowing recheck would only be useful if CometBFT doesn't crash if the sequencer app crashses. I'll do some more research |
345fcab
to
af24538
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm blocking this so I can have a second pass over this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have gone over the PR and left some comments. Reading the PR description this patch attempts to tackle 3 things at the same time:
-
avoid rerunning expensive checks that have already been performed once. I believe this new functionality is covered by the changes in
crate::service::mempool
. It is enabled by the functionscheck_removed_comet_bft
andis_tracked
. Theis_tracked
makes use of the newcontained_txs
hashset. Question: is this an optimization to avoid usingpending
andparked
mempool fields? Clearly the mempool needed to track the transactions contained therein before - so what's the reason to have an extra index of transactions? It would be nice to see that explained in the patch description. It would furthermore be useful to understand how this new index and the previous caches will be kept in sync. -
fix a regression for
CheckTxKind::Recheck
. Grepping this PR I only findrecheck
mention in tests. Looking at thefn handle_check_tx
function therequest::CheckTx
is immediately deconstructed and theCheckTxKind::Recheck
never actively considered. It hence looks to me as if we don't discriminate between checks of new transactions and rechecked of transactions existing in the mempool. Given that this is a regression called out in both the patch title and text, it would be nice to have this explained. -
introduce a way to maintain state on sequencer restarts. This part seems to be completely orthogonal to the goals 1. and 2. All related code should be removed from this patch and moved into a new PR. Having said that, it seems that the index of checked transactions (the hashset mentioned in the PR description) was introduced to enable the main goal of the PR (namely skipping unnecessary checks summarized under 1.). Enabling restart logic thus seems to be a useful side effect? Either way, it is confusing to the reviewer in how far this should be made part of the patch, and why it wasn't moved to a new patch. The patch description should be amended accordingly and better explained.
The semantics of adding or removing transactions from the mempool-tracked txs is weird. Looking at the code that makes use of add_from_contained_txs
and remove_from_contained_txs
it looks like you would be better served by creating a custom guard like this (I have intentionally left out the metrics stuff to keep the example minimal; add them as necessary):
struct TxLock<'a> {
mempool: &'a Mempool,
txs: RwLockWriteGuard<'a, HashSet<[u8; 32]>>,
}
impl<'a> TxLock<'a> {
fn add(&mut self, id: [u8; 32]) {
self.tx.insert(id);
}
fn remove(&mut self, id: [u8; 32]) {
self.tx.remove(id);
}
}
impl Mempool {
async fn lock_txs(&self) -> TxLock<'_> {
TxLock {
mempool: self,
txs: mempool.contained_txs.write().await,
}
}
}
|
||
/// Removes a transaction from the mempool's tracked transactions. | ||
/// Will increment logic error metrics and log error if transaction is not present. | ||
fn remove_from_contained_txs(&self, tx_hash: [u8; 32], contained_txs: &mut HashSet<[u8; 32]>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My arguments for the mempool bits above also apply here: this is not removing a transactions from the mempool tracked transactions. It's mutating an argument. This method seems to be a vehicle to access the metrics.
Change or remove please.
} | ||
|
||
#[tokio::test] | ||
async fn receck_adds_non_tracked_tx() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still a spelling issue here.
@@ -1,3 +1,6 @@ | |||
#[cfg(test)] | |||
mod tests; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually move this below the imports please. We are following the official Rust style guide: https://doc.rust-lang.org/stable/style-guide/items.html
|
||
// tx is valid, push to mempool with current state | ||
// generate address for the signed transaction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This address construction is not necessary I believe (but fixing it is the scope of this PR).
Please remove this comment and create a followup issue to just us the address bytes to fetch all relevant information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked: #1620
} | ||
RemovalReason::LowerNonceInvalidated => { | ||
return Err(response::CheckTx { | ||
code: Code::Err(AbciErrorCode::LOWER_NONCE_INVALIDATED.value()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent behind info
was to give a human readable description of the abci code. So please use AbciErrorCode::LOWER_NONCE_INVALIDATED.to_string()
or, if you need a more granular reason, add a new abci code.
Alternatively, provide more background information to the log
line.
info: "transaction removed from app mempool due to lower nonce being \ | ||
invalidated" | ||
.into(), | ||
log: "Transaction removed from app mempool due to lower nonce being \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lower case. Also, log and info seem to provide the same information. That doesn't seem useful.
return Err(response::CheckTx { | ||
code: Code::Err(AbciErrorCode::INVALID_NONCE.value()), | ||
info: "transaction removed from app mempool due to stale nonce".into(), | ||
log: "Transaction from app mempool due to stale nonce".into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lower case - also info and log seem to have the same text. Doesn't seem useful.
if let Err(error) = | ||
pending.add(ttx, current_account_nonce, ¤t_account_balances) | ||
{ | ||
// remove from tracked | ||
// note: this branch is not expected to be hit so grabbing the lock inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convention is to use upper case XXX
, TODO
, FIXME
, NOTE
etc for comments like this.
// note: this branch is not expected to be hit so grabbing the lock inside | |
// NOTE: this branch is not expected to be hit so grabbing the lock inside |
if let Err(error) = | ||
pending.add(ttx, current_account_nonce, ¤t_account_balances) | ||
{ | ||
// remove from tracked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If code doing something as simple as this needs to be commented, then its either not self evident (leave a comment tagged with XXX
or NOTE
, see below), or adjust the called methods to make it evident.
tx_hash: [u8; 32], | ||
mempool: &AppMempool, | ||
metrics: &Metrics, | ||
) -> Result<(), response::CheckTx> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just return a RemovalReason
here and instead implement a new trait IntoCheckTxResponse
on it so that you don't need to do the conversion inline.
state: &S, | ||
signed_tx: SignedTransaction, | ||
metrics: &'static Metrics, | ||
) -> Result<(), response::CheckTx> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have noted it for RemovalReason
, but it also applies here: all the error-to-response conversions could be made a lot neater by adding a trait IntoCheckTxResponse
and moving the conversions there instead of having them inlined in the function body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Wrote the trait and defined if for RemovalReason
and InsertionError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, neat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks neat, thank you for your work.
I have made some notes on how you can now use your newly defined ContainedTxLock
to avoid accidentally holding on to the lock for too long (which is an actual benefit of the design!).
Please also have a look at how composer (and bridge-withdrawer) might be affected by the nonce-related ABCI codes. Maybe create issues for followup tasks.
Happy to merge this after updating the PR description with respect to my last review.
@@ -52,6 +55,13 @@ impl AbciErrorCode { | |||
} | |||
Self::LOWER_NONCE_INVALIDATED => "lower nonce was invalidated in mempool".into(), | |||
Self::BAD_REQUEST => "the request payload was malformed".into(), | |||
Self::ALREADY_PRESENT => "the transaction is already present in the mempool".into(), | |||
Self::NONCE_TAKEN => "there is already a transaction with the same nonce for the \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if NONCE_TAKEN
and LOWER_NONCE_INVALIDATED
would have any repercussions on composer? Right now composer acts on INVALID_NONCE
, refetching the latest nonce and then resubmitting at the latest nonce + 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. The NONCE_TAKEN
error case would case the bundle factory to not resend when it should. The ALREADY_PRESENT
one isn't an error, it means that the bundle is already inside of the mempool. I should change those to reflect these changes probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked issue: #1633
let mut contained_lock = self.lock_contained_txs().await; | ||
contained_lock.remove(tx_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your new ContainedTxLock
type, you can now do something very neat that avoids holding on to the lock at all:
self.lock_contained_txs()
.await
.remove(tx_id);
This means that the lock is dropped at the end of the statement since you never bind it anywhere.
@@ -1000,4 +1090,65 @@ mod tests { | |||
"first removal reason should be presenved" | |||
); | |||
} | |||
|
|||
#[tokio::test] | |||
async fn tx_tracked_edge_cases() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I now already have a better understanding what this is for. Ideally each edge case gets its own test (even at the cost of more code) so that it's immediately clear what you are testing.
state: &S, | ||
signed_tx: SignedTransaction, | ||
metrics: &'static Metrics, | ||
) -> Result<(), response::CheckTx> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, neat.
app::StateReadExt as _, | ||
bridge::StateReadExt as _, | ||
ibc::StateReadExt as _, | ||
}; | ||
|
||
#[instrument(skip_all)] | ||
pub(crate) async fn check_nonce_mempool<S: StateRead>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like for all of these to migrate over to the mempool at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created an issue #1632
…sion (#1515) ## Summary Previously `handle_check_tx()`'s would always run all stateless checks and attempt insertion. This PR modified the function to only run the checks on transactions that have not been removed or are not contained inside of the app side mempool. This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used to enable seeing if a transaction is contained in the Mempool without having to decode it into a `SignedTransaction`. This will be kept in sync with unit tests which check if the length of the mempool (which is now calculated with the HashSet) matches the expected state of mutating the underlying containers. We considered using just a toggle on `CheckTX::New` and `CheckTX::Recheck` instead of the HashSet, but there is the possibility of CometBFT persisting transactions on restart which could make use of toggling on the app mempool knowing the transaction instead. TBD, but it's also useful to be able to query the mempool if it contains a transaction. This PR also restores the original behavior of `CheckTx` to not kick out transaction needlessly on rechecks, this regression was added when upgrading the mempool in #1323. Pre #1323, re-inserting the same transaction into the mempool would not cause an error. Post #1323, the re-insertion would cause an error. We now check to make sure that the transaction is not inside of the mempool before attempting to re-insert it. ## Background A lot of the functionality from the original `handle_check_tx()` has moved into the app side mempool itself, including the balance and nonce checks. All of the remaining checks will not change on `CheckTx::Recheck` and only need to be ran when considering a transaction for insertion into the appside mempool. ## Testing Ran locally and added unit tests to ensure future nonces can be added and rechecks do not remove transactions needlessly. ## Metrics - Deleted `check_tx_removed_stale_nonce` - Added `check_tx_duration_seconds_check_tracked`, `mempool_tx_logic_error` - Changed `check_tx_duration_seconds_check_nonce` to `check_tx_duration_seconds_fetch_nonce` ## Breaking Changes Added new ABCI error codes for transaction insertion errors that would be useful to surface to the end user: `ALREADY_PRESENT`, `NONCE_TAKEN`, `ACCOUNT_SIZE_LIMIT`. ## Related Issues closes #1481 --------- Co-authored-by: Fraser Hutchison <[email protected]>
## Summary Updates composer to work with the new appside mempool. New ABCI error codes were added and now nonces can be stacked, so the submission logic should use the GRPC `pending_nonce()` instead of the RPC `get_latest_nonce()`. ## Background New ABCI error codes were added in PR #1515 which need to be handled in the composer's submission logic. New Error codes: - `NONCE_TAKEN` -> try resubmitting with new top-of-pending. If this is happening because the composer is out of funds, that is okay as the composer has it's own fund monitoring logic - `PARKED_FULL` -> normal error case ok - `ACCOUNT_SIZE_LIMIT` -> normal error case ok - `ALREADY_PRESENT` -> normal error case ok, shouldn't be returned ## Background We changed the internals of the mempool which resulted in the users of the mempool to have changes as well. ## Testing Smoke tests ## Related Issues closes #1633 --------- Co-authored-by: quasystaty <[email protected]>
Summary
Previously
handle_check_tx()
's would always run all stateless checks and attempt insertion. This PR modified the function to only run the checks on transactions that have not been removed or are not contained inside of the app side mempool.This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used to enable seeing if a transaction is contained in the Mempool without having to decode it into a
SignedTransaction
. This will be kept in sync with unit tests which check if the length of the mempool (which is now calculated with the HashSet) matches the expected state of mutating the underlying containers.We considered using just a toggle on
CheckTX::New
andCheckTX::Recheck
instead of the HashSet, but there is the possibility of CometBFT persisting transactions on restart which could make use of toggling on the app mempool knowing the transaction instead. TBD, but it's also useful to be able to query the mempool if it contains a transaction.This PR also restores the original behavior of
CheckTx
to not kick out transaction needlessly on rechecks, this regression was added when upgrading the mempool in #1323. Pre #1323, re-inserting the same transaction into the mempool would not cause an error. Post #1323, the re-insertion would cause an error. We now check to make sure that the transaction is not inside of the mempool before attempting to re-insert it.Background
A lot of the functionality from the original
handle_check_tx()
has moved into the app side mempool itself, including the balance and nonce checks. All of the remaining checks will not change onCheckTx::Recheck
and only need to be ran when considering a transaction for insertion into the appside mempool.Testing
Ran locally and added unit tests to ensure future nonces can be added and rechecks do not remove transactions needlessly.
Metrics
check_tx_removed_stale_nonce
check_tx_duration_seconds_check_tracked
,mempool_tx_logic_error
check_tx_duration_seconds_check_nonce
tocheck_tx_duration_seconds_fetch_nonce
Breaking Changes
Added new ABCI error codes for transaction insertion errors that would be useful to surface to the end user:
ALREADY_PRESENT
,NONCE_TAKEN
,ACCOUNT_SIZE_LIMIT
.Related Issues
closes #1481